Solution: Use Columns Unambiguously

Let’s discover how we can solve the antipattern using unambiguous queries.

We'll cover the following

Query only functionally dependent columns
Using a correlated subquery
Using a derived table
Using a JOIN
Using an aggregate function for extra columns
Concatenating all values per group

The sections that follow describe several ways we can resolve this antipattern and write unambiguous queries.

Query only functionally dependent columns#

The most straightforward solution is to eliminate ambiguous columns from the query.

Enter to Rename, Shift+Enter to Preview

Retrieving the latest reported date of a bug by using a straightforward solution

The query reveals the date of the latest bug per product, even though it doesn’t report the bug_id corresponding to that latest bug. Sometimes this is enough, and we shouldn’t overlook this simple solution.

Using a correlated subquery#

A correlated subquery contains a reference to the outer query and produces different results for each outer query row. We can use this to find the latest bug per product by running a subquery to search for bugs with the same product and a greater date. When the subquery finds none, the bug in the outer query is the latest.

Enter to Rename, Shift+Enter to Preview

Retrieving the latest reported date of a bug using correlated subquery

This is a simple solution that is readable and easy to code. However, we must remember that this solution isn’t likely to be the best for performance because correlated subqueries are executed once for each row of the outer query.

Using a derived table#

We can use a subquery as a “derived table”, producing an interim result that contains only the product_id and the corresponding greatest bug report date for each product. Then, we can use this result to join against the tables so that the query result contains only bugs with the latest date per product.

Enter to Rename, Shift+Enter to Preview

Retrieving the latest reported date of a bug using a derived table

Under this solution, we can get multiple rows per product if the latest date returned by the subquery matches multiple rows. If we need to ensure a single row per product_id, we can use another grouping function in the outer query:

Enter to Rename, Shift+Enter to Preview

Retrieving the latest reported date of a bug using a derived table while ensuring a single row per product

We can use the derived table solution as a more scalable alternative to the correlated subquery. The derived table is non-correlated, so most database brands should be able to execute the subquery once. However, the database must store the interim result set in a temporary table, so this solution still isn’t the best for performance.

Using a `JOIN`#

We can create a join that tries to match against a set of rows that may not exist. This type of join is called an outer join. Where the matching rows don’t exist, NULL is used for all columns in that nonexistent row. So, where the query finds NULL, we know that no such row was found.

SELECT bp1.product_id, b1.date_reported AS latest, b1.bug_id
FROM Bugs b1 JOIN BugsProducts bp1 ON (b1.bug_id = bp1.bug_id)
LEFT OUTER JOIN (Bugs AS b2 JOIN BugsProducts AS bp2 ON (b2.bug_id = bp2.bug_id))
ON (bp1.product_id = bp2.product_id AND (b1.date_reported < b2.date_reported
OR b1.date_reported = b2.date_reported AND b1.bug_id < b2.bug_id))
WHERE b2.bug_id IS NULL;

Enter to Rename, Shift+Enter to Preview

Retrieving the latest reported date of a bug using a JOIN

It takes a few minutes of gazing at this query, and perhaps some doodles on notepaper, for most people to see how it works. But once we do, this technique can be an important tool.

We use the JOIN solution when the scalability of the query over large sets of data is important. Although it’s a tougher concept to grasp and therefore more difficult to maintain, it often scales better than a subquery-based solution. We must remember to measure the performance of several query forms instead of assuming that one performs better than the other.

Using an aggregate function for extra columns#

We can make the extra column comply with the Single-Value Rule by applying another aggregate function to it.

Enter to Rename, Shift+Enter to Preview

Retrieving the latest reported date of a bug using another aggregate function

We use this solution only when we can rely on the latest bug_id being the bug with the latest date, that is, if the bugs are guaranteed to be reported in chronological order.

Concatenating all values per group#

Finally, we can use another aggregate function on bug_id to avoid violating the Single-Value Rule. MySQL and SQLite support a function GROUP_CONCAT() that concatenates all the values in the group into one value. By default, this is a comma-separated string.

Enter to Rename, Shift+Enter to Preview

Concatenating all values per group using GROUP_CONCAT()

This query doesn’t reveal which bug_id corresponds to the latest date; the bug_id_list includes all bug_id values in each group.

Another disadvantage of this solution is that it isn’t standard SQL, and other brands of databases don’t support this function. Some brands of databases support custom functions and custom aggregate functions. For example, here’s the solution for PostgreSQL:

/

main.sql

main.sql

Enter to Rename, Shift+Enter to Preview

Custom aggregate functions in PostgreSQL for concatenating values

Some other brands of database don’t support custom functions, so the solution may require writing a stored procedure to loop over a non-grouped query result, concatenating values manually.

We can use this solution when we expect the extra column to have a single value per group but the column still violates the Single-Value Rule.

Antipattern: Reference Non-grouped Columns

Synopsis: Random Selection

Mark as Completed

Report an Issue

Introduction

Logical Antipattern - Jaywalking

Logical Antipattern - Naive Trees

Logical Antipattern - ID Required

Logical Antipattern - Keyless Entry

Assessment: Logical Antipatterns - Part 1

Logical Antipattern - Entity-Attribute-Value

Logical Antipattern - Polymorphic Associations

Logical Antipattern - Multicolumn Attributes

Logical Antipattern - Metadata Tribbles

Assessment: Logical Antipatterns - Part 2

Physical Antipattern - Rounding Errors

Physical Antipattern - 31 Flavors

Physical Antipattern - Phantom Files

Physical Antipattern - Index Shotgun

Assessment: Physical Antipatterns

Query Antipattern - Fear of the Unknown

Query Antipattern - Ambiguous Groups

Query Antipattern - Random Selection

Query Antipattern - Poor Man’s Search Engine

Query Antipattern - Spaghetti Query

Query Antipattern - Implicit Columns

Assessment: Query Antipattern

Application Development Antipattern - Readable Passwords

Application Development Antipattern - SQL Injection

Application Development Antipattern - Pseudokey Neat-Freak

Application Development Antipattern - See No Evil

Application Development Antipattern - Diplomatic Immunity

Application Development Antipattern - Magic Beans

Assessment: Application Development Antipattern

Conclusion

Solution: Use Columns Unambiguously

Query only functionally dependent columns#

Using a correlated subquery#

Using a derived table#

Using a `JOIN`#

Using an aggregate function for extra columns#

Concatenating all values per group#

/

Solution: Use Columns Unambiguously

Query only functionally dependent columns#

Using a correlated subquery#

Using a derived table#

Using a JOIN#

Using an aggregate function for extra columns#

Concatenating all values per group#

/

Using a `JOIN`#